Smoothing Effects of Bagging
Authors
Abstract
Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. We study the von Mises expansion of a bagged statistical functional and show that it is related to the Stein-Efron ANOVA expansion of the raw (unbagged) functional. The basic observation of the present short note is that a bagged functional is always smooth in a precise sense, even if the raw functional is rough or unstable.

AT&T Labs–Research, 180 Park Ave, Florham Park, NJ 07932-0971; [email protected]

Department of Statistics, University of Washington, Seattle, WA 98195-4322; [email protected]. Research partially supported by NSF grant DMS 9803226. This work was performed while the second author was on sabbatical leave at AT&T Labs.

1 Notations, Definitions and Assumptions for Bagging Statistical Functionals

We need some standard notations and assumptions in order to define bagging for statistics and, more generally, for statistical functionals. Let θ be a real-valued statistical functional θ(F): P → ℝ defined on a subset P of the probability measures on a given sample space. By assumption all empirical distributions F_n = (1/n) Σ_{i=1}^n δ_{x_i} are contained in P. If θ is evaluated at an empirical measure, it specializes to a statistic, which we write as θ(F_n) = θ(x_1, ..., x_n). This is a permutation-symmetric function of the n sample points. We will repeatedly need expectations of random variables θ(X_1, ..., X_n), where X_1, ..., X_n are i.i.d. according to some F:

    E_F θ(X_1, ..., X_n) = ∫ θ(x_1, ..., x_n) dF(x_1) ··· dF(x_n).

Following Breiman (1996), we define bagging of a statistic θ(F_n) as the average over bootstrap samples X^*_1, ..., X^*_n drawn i.i.d. from F_n:

    θ^B(F_n) = E_{F_n} θ(X^*_1, ..., X^*_n).

For our purposes we need to generalize the notion of bagging to statistical functionals θ(F). A natural extension is

    θ^B_n(F) = E_F θ(X^*_1, ..., X^*_n),

where the random variables X^*_1, ..., X^*_n are i.i.d. F, and their number n is merely a parameter of the bagging procedure. Unlike for an empirical distribution of an actual sample, for a general probability measure F there is no notion of sample size. The variables X^*_i should still be thought of as bootstrap samples, albeit drawn from an "infinite population". Since n now denotes a parameter of the bagging procedure, we need to distinguish it from the size N of the actual data x_1, ..., x_N (compare Friedman and Hall 2000). If one models the data as i.i.d. samples from F, one estimates F with the empirical distribution F_N = (1/N) Σ_{i=1}^N δ_{x_i}. The functional θ(F) is then estimated by plug-in with the statistic θ(F_N):

    θ̂(F) = θ(F_N).

The bagged functional θ^B_n(F) in turn is estimated with the plug-in estimator θ^B_n(F_N):

    θ̂^B_n(F) = θ^B_n(F_N) = E_{F_N} θ(X^*_1, ..., X^*_n).

The idea of bagging is to smooth θ, with the number n playing the role of a smoothing parameter. It is not a priori clear, though, whether more smoothing occurs for small n or for large n. Here is an intuition that proves to be correct: bagging averages over empiricals F_n, hence more smoothing occurs when F_n is allowed to roam further from F, effectively using a larger neighborhood ("bandwidth") around F; since F_n → F as n → ∞, F_n is on average closer to F for large n, hence the "bandwidth" is larger for small n.
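As a concrete aside (not part of the original note), the plug-in bagged estimator θ̂^B_n(F) = E_{F_N} θ(X^*_1, ..., X^*_n) is in practice approximated by Monte Carlo: draw many bootstrap samples of size n from the observed data and average θ over them. The following minimal Python sketch assumes NumPy; the function name bagged_statistic, the 0/1 threshold statistic, the sample sizes, and the number of replicates n_boot are illustrative choices, not taken from the paper.

```python
import numpy as np

def bagged_statistic(theta, data, n, n_boot=2000, seed=None):
    """Monte Carlo approximation of the bagged plug-in estimator:
    average theta over n_boot bootstrap samples of size n drawn
    with replacement from the observed data (the empirical F_N)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    values = [theta(rng.choice(data, size=n, replace=True)) for _ in range(n_boot)]
    return float(np.mean(values))

def threshold_rule(sample):
    # A deliberately rough (discontinuous) statistic: 0/1 indicator
    # of whether the sample mean exceeds zero.
    return float(np.mean(sample) > 0.0)

rng = np.random.default_rng(0)
data = rng.normal(loc=0.1, scale=1.0, size=100)   # observed data of size N = 100

print(threshold_rule(data))                                   # raw plug-in value: 0 or 1
print(bagged_statistic(threshold_rule, data, n=20, seed=1))   # small n: heavily smoothed
print(bagged_statistic(threshold_rule, data, n=100, seed=1))  # larger n: closer to the raw value
```

Because the raw statistic jumps between 0 and 1, bagging with a small resample size n typically returns an intermediate value, while a larger n stays closer to the raw plug-in value, in line with the bandwidth intuition above.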
The calculations below verify that this is so, but curiously the reason has nothing to do with F_n being close to, or far from, F: it turns out that the von Mises expansion of an n-bagged functional is finite of length n; because the von Mises expansion is essentially a Taylor expansion, the n-bagged functional is smoother if the expansion is shorter, that is, if n is smaller.

The above definition of a bagged statistical functional has a blind spot: it would be interesting to consider both bootstrap sampling with replacement (conventional) and bootstrap sampling without replacement with n strictly smaller than N (as in Friedman and Hall (2000) and Buhlmann and Yu (2000)). If the bootstrap is extended to infinite populations, however, the difference between sampling with and without replacement disappears. Thus, in order to capture both modes of sampling, one is limited to finite populations and correspondingly to statistics as opposed to statistical functionals.

If bagging is smoothing by averaging over nearby empirical distributions, one may wonder whether other types of bagging could be conceived. In fact, one can more generally define a smoothed version θ̄ of θ by

    θ̄(F) = ave( θ(G) | G ∈ N(F) ),

where N(F) is some sort of neighborhood of F and "ave" denotes some way of averaging. This suggests a number of generalizations of bagging, for example by varying the neighborhoods and the meaning of "ave". In the present note, however, we remain with Breiman's original version of bagging and pursue some implications of averaging over empirical distributions.

2 Preliminaries 1: The von Mises Expansion of a Statistical Functional

The von Mises expansion of a functional θ around a distribution F is an expansion of the form

    θ(G) = θ(F) + ∫ ψ_1(x) d(G−F)(x) + (1/2) ∫ ψ_2(x_1, x_2) d(G−F)^⊗2 + ···
         = θ(F) + Σ_{k=1}^∞ (1/k!) ∫ ψ_k(x_1, ..., x_k) d(G−F)^⊗k .
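As a quick worked illustration (not part of the original note), consider the squared-mean functional; the example and its kernels are supplied here only to make the structure of the expansion concrete:

    θ(F) = μ_F^2,  where  μ_F = ∫ x dF(x).

Writing Δ = ∫ x d(G−F)(x), so that μ_G = μ_F + Δ, we get

    θ(G) = (μ_F + Δ)^2 = θ(F) + 2 μ_F Δ + Δ^2,

which is a von Mises expansion with first-order kernel ψ_1(x) = 2 μ_F x (one may also center it at μ_F, since constants integrate to zero against G − F), second-order kernel ψ_2(x_1, x_2) = 2 x_1 x_2 (so that (1/2) ∫ ψ_2 d(G−F)^⊗2 = Δ^2), and all higher-order kernels equal to zero. A functional whose expansion terminates this early is smooth in the sense relevant here; the observation described above states that an n-bagged functional always has such a finite expansion, of length n.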
Similar Articles
Bagging Down-Weights Leverage Points
Bagging is a procedure that averages estimators trained on bootstrap samples. Numerous experiments have shown that bagged estimates often yield better results than the original predictor, and several explanations have been given to account for this gain. However, six years after its introduction, bagging is still not fully understood. Most explanations given until now are based on global properties ...
Smoothing Effects of Bagging: Von Mises Expansions of Bagged Statistical Functionals
Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. We extend the definition of bagging from statistics to statistical functionals and study the von Mises expansion of bagged ...
Bagging Binary and Quantile Predictors for Time Series: Further Issues
Bagging (bootstrap aggregating) is a smoothing method to improve predictive ability in the presence of parameter estimation uncertainty and model uncertainty. In Lee and Yang (2006), we examined how (equal-weighted and BMA-weighted) bagging works for one-step-ahead binary prediction with an asymmetric cost function for time series, where we considered simple cases with particular choices of a...
Bagging Constrained Equity Premium Predictors
The literature on excess return prediction has considered a wide array of estimation schemes, among them unrestricted and restricted regression coefficients. We consider bootstrap aggregation (bagging) to smooth parameter restrictions. Two types of restrictions are considered: positivity of the regression coefficient and positivity of the forecast. Bagging constrained estimators can have smalle...
On the stability of support vector machines for face detection
In this paper we study the stability of support vector machines in face detection by decomposing their average prediction error into the bias, variance, and aggregation effect terms. Such an analysis indicates whether bagging, a method for generating multiple versions of a classifier from bootstrap samples of a training set, and combining their outcomes by majority voting, is expected to improv...